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1. Introduction 


The importance of computational chemistry in modern scientific research is well 
established. Continuous improvement in software and algorithms for the modeling of 
chemical interactions has transformed molecular modeling into a powerful tool for many 
current day research projects. From a medical perspective, one of the ultimate goals in 
computer-aided drug design (CADD) is the accurate prediction of ligand-binding affinities 
to a macromolecular target, which can facilitate and speed the routine identification of new 
candidates in early stage drug discovery projects (Gilson & Zhou, 2007; Hayes & Leonidas, 
2010). In particular, structure-based modeling provides an efficient pathway towards 
exploiting known three-dimensional structural data in the design and proposal of new 
molecules for experimental evaluation. Docking calculations are now widely used in high- 
throughput virtual screening of structurally diverse molecules from available compound 
libraries/databases against specific targets. Once initial “hits” or “lead” molecules are 
identified (normally low uM inhibitors), modification of their chemical features in the “lead 
optimization” phase can improve their binding affinities and fine-tune other desirable drug- 
like properties. However, docking calculations currently have limited success beyond the 
lead identification stage, where more accurate lower-throughput computational methods 
are needed. In this regard, the Molecular Mechanics/Generalized Born Surface Area (MM- 
GBSA) and Molecular Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) methods 
calculate binding free energies using molecular mechanics (forcefields) and continuum 
(implicit) solvation models (Kollman et al., 2000). They have been successfully applied across 
a range of targets and are implemented in software programs such as Amber (Case et al., 
2005), Delphi (Rocchia et al., 2001) and Schrödinger (Du et al., 2011). With a target 
readership from beginner to expert, the current chapter provides an extensive and critical 
overview of MM-GB(PB)SA methods and their applications. The theoretical foundation of 
the MM-GB(PB)SA method is first described. We then discuss key aspects which improve 
the accuracy of results, and highlight potential caveats due to the approximations inherent 
in the methods. The chapter concludes with a review of recent representative applications, 
which illustrate both successes and limitations. The emphasis of this chapter is on structure- 
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based drug design (SBDD) efforts, however, the methods have widespread applicability in 
other areas such as in supramolecular chemistry. 


2. Theoretical background — “Pathway” and “Endpoint” methods 


Low-throughput computational approaches for the calculation of ligand binding free 
energies can be divided into “pathway” and “endpoint” methods (Deng & Roux, 2009; 
Gilson & Zhou, 2007). In pathway methods, the system is converted from one state (e.g., the 
complex) to the other (e.g., the unbound protein/ligand). This can be achieved by 
introducing a set of finite or infinitesimal “alchemical” changes to the energy function (the 
Hamiltonian) of the system through free-energy perturbation (FEP) or thermodynamic 
integration (TI), respectively (Kollman, 1993; Straatsma & McCammon, 1992). The 
fundamentals of FEP and TI methods were introduced many decades ago by John Kirkwood 
(Kirkwood, 1935) and Robert Zwanzig (Zwanzig, 1954). In recent years, their use in the 
computation of absolute binding affinities has become feasible due to increases in 
computational power, the development of more accurate models of atomic interactions 
(Cornell et al., 1995; MacKerell et al., 1998; van Gunsteren et al., 1996), the clarification of the 
underlying theoretical framework and the introduction of methodological advances 
(Boresch et al., 2003; Bowers et al., 2006; Deng and Roux, 2009; Gilson et al., 1997; Gilson 
& Zhou, 2007; Lee & Olson, 2006; Mobley et al., 2007). Combined with atomistic molecular 
dynamics (MD) or Monte Carlo (MC) simulations in explicit water solvent models, they 
are arguably the most accurate methods for calculating absolute or relative ligand binding 
affinities. 


The “alchemical” computation of differences in binding affinities (rather than absolute 
affinities) among a set of related ligands for the same target protein is more accurate and 
technically simpler. A thermodynamic cycle illustrating the basic principles is shown in 
Figure 1 (Tembe & McCammon, 1984). The horizontal legs describe the experimentally 
accessible actual binding processes, with free energies AGpina(L1) and AGpina(L2). Since the 
free energy is a state function, the relative binding free energy AAGpina is exactly equal to the 
difference of the free energies in the horizontal or vertical legs: 


AAG ping = AGpina (L2) — AGping (L1) (1) 


= AG 1 $1.2) UG, ..(L1-4L 25 (2) 


complex 


The simulations follow the vertical steps (Eq. (2)) or unphysical processes, by simulations in 
water solution that gradually change the energy-function of the system from one “endpoint” 
to the other through a series of intermediate hybrid states.From Figure 1, this involves the 
stepwise “alchemical” transformation of ligand L1 to L2 both in its ‘free’ state (unbound) 
and in the bound complex, through gradual changes in the forcefield parameters describing 
the ligand interactions. This leads to the free energy changes AGpfree(L1—-L2) and 
AGcomplex(L1—>L2), respectively. Averaging over both transformation directions is often used 
to improve the free-energy estimates, although this is not always the case (Lu & Woolf, 
2007). These calculations can be accurate, if conducted with the appropriate care. An 
overview of current state-of-the-art methods for absolute and relative affinity calculations is 
in (Chodera et al., 2011). 
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AGbina( L2) 


L2 + Protein L2:Protein 


AGfree(LI=L2) AG comptex(L1>L2) 


L1 + Protein. ———_ L1:Protein 
AG pinal LI ) 


Fig. 1. Thermodynamic cycle linking the binding of two ligands L1 and 12 to a protein in 
solution. 


To conclude, the emerging implementation of biomolecular codes on GPU architectures 
(Harvey et al., 2009; Stone et al., 2011) and the development of simple free-energy protocols 
(Boresch & Bruckner, 2011) make atomistic methods of absolute or relative affinities very 
promising for larger-scale calculations in the near future. Nevertheless, at present they are 
still relatively time-consuming, and require considerable expertise and planning. They 
preclude the consideration of more than a few complexes per day on a dedicated CPU 
cluster with a few tens of nodes. A trade-off between computational expense and accuracy is 
therefore required when the goal is to investigate and compare the binding strengths of a 
structurally diverse and/or larger set of ligands via MD simulations. For this purpose, much 
less computationally demanding ,,endpoint” methods are often successfully applied, such as 
the „linear interaction energy” (LIE) (Aqvist et al., 2002) or the molecular mechanics - 
Poisson Boltzmann (MM-PBSA) (Kollman et al., 2000) and the related molecular mechanics - 
generalised Born (MM-GBSA) approximation (Gohlke et al., 2003). All these methods 
compute binding free energies along the horizontal legs of Figure 1, but use only models for 
the “endpoints” (bound and unbound states). The MM-PB(GB)SA methodology is now 
described in more detail. 


3. The MM-GB(PB)SA methodology 


Using MM-GBSA and MM-PBSA methods, relative binding affinities for a set of ligands to a 
given target can often be reproduced with good accuracy and considerable less 
computational effort compared to full-scale molecular dynamics FEP/TI simulations. 
Furthermore, free-energies can be decomposed into insightful interaction and desolvation 
components (Archontis et al., 2001; Hayes et al., 2011; Polydoridis et al., 2007). 


3.1 Thermodynamics & calculation framework 


In the MM-GB(PB)SA formulation, the binding free energy of a ligand (L) to a protein (P) to 
form the complex (PL) is obtained as the difference (Pearlman, 2005): 


AG, , =G(PL)-G(P)-G(L) 


bind = 


(3) 


The free energy of each of the three molecular systems P, L, and PL is given by the 
expression: 


G(X) = Eyy (X)+G a, (X)-TS(X) 


(4) 


www.intechopen.com 


174 Molecular Dynamics — Studies of Synthetic and Biological Macromolecules 


In Eq. (4), Emm is the total molecular mechanics energy of molecular system X in the gas 
phase, Gsoiv is a correction term (solvation free energy) accounting for the fact that X is 
surrounded by solvent, and S is the entropy of X. 


To apply the MM-GB(PB)SA formulation, a representative set of equilibrium conformations 
for the complex, free protein and free ligand are first obtained by atomistic MD simulations 
in explicit solvent. In this post-processing phase, the solvent is discarded and replaced by a 
dielectric continuum. Changes (A) in the individual terms (AEmm, AGgolv, -TAS) of Eq.(4) 
between the unbound states and the bound (complex) state are calculated, and contribute to 
the binding free energies according to Eq.(3). Computation of each of the terms in Eq. (4) is 
now described in more detail. 


Emm is the sum of the bonded (internal), and non-bonded electrostatic and van der Waals 
energies 


Emm = Ehonded + Egje + a (5) 


These energy contributions are computed from the atomic coordinates of the protein, ligand 
and complex using the (gas phase) molecular mechanics energy function (or forcefield). The 
solvation free energy term Gsoly contains both polar and non-polar contributions. The polar 
contributions are accounted for by the generalized Born, Poisson, or Poisson-Boltzmann 
model, and the non-polar are assumed proportional to the solvent-accessible surface area 
(SASA), c.f. Section 3.2: 


Gsolv = Gpg(aB) + Gsasa (6) 


Finally, the entropy S is decomposed into translational, rotational and vibrational 
contributions. The first two are computed by standard statistical-mechanical expressions, 
and the last is typically estimated from a normal-mode (harmonic or quasiharmonic) 
analysis (Brooks et al., 1995; Karplus & Kushick 1981; Tidor & Karplus 1994). In practice, 
current software implementations normally determine all three contributions to S as part of 
a normal-mode analysis. 


To improve the accuracy of the computed binding free energies, the various terms of Eq. (4) 
are averaged over multiple conformations or MD snapshots (typically a few hundred for the 
Emm and Ggoly contributions). Depending on the extent of conformational fluctuations in the 
system under consideration, the convergence into stable values may require relatively long 
(multi-ns) simulations. The computation of the entropy term, however, requires the 
extensive minimization of the trajectory conformations for the protein, ligand and complex 
to local minima on the potential energy surfaces, followed then by normal mode analysis. 
This procedure is costly and prevents the consideration of a large number of conformations; 
insufficient sampling can therefore sometimes an issue. To decrease the computational cost, 
the protein can be truncated beyond a certain cutoff distance and the system minimized 
using a distance-dependent dielectric, which simulates the deleted surroundings (Kongsted 
& Ryde, 2009). However, a large variation of the entropy term often results from these ‘free’ 
minimizations. Including a fixed buffer region (with water molecules) beyond the cut-off 
can lead to more stable entropy predictions (Kongsted & Ryde, 2009). 


The internal energy terms (Ebondea) of the protein and complex can be on the order of a few 
thousand kcal/mol, and can introduce large uncertainties in the computed binding free 
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energies. This is prevented in the “single-trajectory” approximation (Gohlke & Case, 2004; 
Page & Bates, 2006), which employs simulations of a single state (the complex) to generate 
conformations for all three states (complex, protein and ligand). For each MD conformation 
sampled, the resulting internal energy terms of the protein and ligand are identical in the 
bound and the unbound states and cancel exactly in Eq. (3). Hence, effectively only the 
protein-ligand (non-bonded) interaction energies of the Emm term in Eq. (5) contribute to 
AGpina. ‘Single-trajectory’ simulations significantly reduce computational effort and are 
generally sufficiently accurate for most applications. The downside of the approximation is 
that any explicit structural relaxation of the protein and ligand upon binding is ignored. 
Although charge reorganization can be partly taken into account implicitly by setting the 
protein/ligand (internal) dielectric constants (Section 3.2) to values larger than ein = 1-2 
(Archontis et al., 2001; Archontis & Simonson, 2005; Schutz & Warshel, 2001; Simonson, 
2003), the neglect of explicit structural relaxation may introduce errors depending on the 
system (Tamamis et al., 2010). Separate MD simulations for the complex, and unbound 
receptor and ligands may also be performed (the “three-trajectory” approximation) but 
require greater computational effort, although in theory should yield more accurate results. 
Indeed, Yang and co-workers have recently shown that including separate simulations for 
the ligand and accounting for the “ligand reorganization” free energy led to significant 
improvements in binding affinity predictions for a set of ligands targeting XIAP (Yang et al., 
2009). In certain cases, therefore, the added expense of separate simulations may be justified. 


3.2 Solvation models — GBSA and PBSA 


Proteins function usually inside aqueous solutions or in membrane environments, which are 
in the vicinity of an aqueous medium. The surrounding solvent can influence protein 
stability and function, ligand binding and protein-protein association. Since the solvent 
modifies in a non-trivial manner the intramolecular and intermolecular interactions, an 
accurate inclusion of solvent effects in biomolecular modeling and simulation is a 
challenging task. 


Currently, the most accurate treatment in molecular simulations is achieved by atomic- 
detail models that represent explicitly the biomolecule and its surrounding environment. 
Several water models are used successfully to describe the aqueous environment in 
atomistic simulations; examples include SPC (Berendsen et al., 1981), SPC/E (Berendsen et 
al., 1987), TIP3P and TIP4P (Jorgensen et al., 1983), and TIP5P (Mahoney & Jorgensen, 2000). 
In practice, the explicit inclusion of water leads to a considerable increase in both the size of 
the simulation system and the computational cost of the simulation itself. Furthermore, the 
computation of solvation or binding free energies requires an exhaustive sampling of the 
solvent degrees of freedom. A much less costly approach is to represent the solvent 
implicitly in the simulation, through the incorporation of additional “potential of mean 
force” terms (Roux, 2001; Roux & Simonson, 1999) in the gas-phase energy function (e.g., Eq. 
(7) below). These terms depend only on the atomic coordinates of the solute, and express the 
solute free energy for a given configuration, after the solvent degrees of freedom have been 
“integrated out” (Roux, 2001; Roux & Simonson, 1999). Thus, the simulation system has the 
same number of degrees of freedom as in the gas phase and there is no need for explicit 
sampling over solvent degrees of freedom. The MM-PB(GB)SA method considered here, 
combine atomistic simulations in explicit solvent for the generation of representative 
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biomolecular conformations with an implicit-solvent estimation of the binding free energies, 
in a post-processing step. 


Conceptually, most implicit solvent models decompose the solvation process into three 
sequential steps (Cramer & Truhlar, 1999): i) creation of a cavity in solution to accommodate 
the biomolecule; ii) switching-on dispersion interactions between the biomolecule and 
surrounding medium, while all atomic charges are set to zero; and iii) switching-on the 
biomolecular charges. The solvation free energies of steps i) and ii) are normally assumed to 
be proportional to the SASA of the biomolecule and represent the non-polar contributions 
(Gsasa) to Gsowv in Eq. (6), although the validity of this approximation has been questioned 
for step ii) (Levy et al., 2003). With a positive coefficient of proportionality, an increase in the 
SASA is associated with an unfavorable increase in solvation free energy, which is partly 
accounted for by the tendency of non-polar residues to be solvent-excluded. The equation 
typically used is of the form: 


Gsasa (X) = 7.SASA + B (7) 


with the y and p parameter values dependant on the method and solvation model (PBSA or 
GBSA) used (Rastelli et al., 2010). 


Meanwhile, step iii) calculates the contribution to solvation free energy due to the charge 
/electrostatic interactions of the solute with the surrounding solvent, the polar contributions 
(Gppcp)) to Gsoiv in Eq. (6). In continuum-electrostatics models such as PB and GB, the solute 
is treated as a low-dielectric cavity embedded in a high dielectric medium. The solute 
charges are in the simplest and most common approximation centered on the individual 
atoms. The resulting solvation free energy of a molecule X is expressed as (Simonson, 2003): 


1 


7 > qigjgy OP (8) 


i,jeX 


GPB(GB) (X)= 


where the summation is over all the atomic charges {qi}. The quantity gjPB(GB) is determined 
using the PB model by numerical solution of the Poisson or Poisson-Boltzmann equation 
(depending on the existence of salt), or using the GB model by an analytical expression with 
the functional form (Simonson, 2003; Still et al., 1990): 


-1/n 
1 n r 
Si" zt (= F |; + Bi exp E] (9) 
1] 


The parameters Bij depend on the position (distance from the solute-solvent dielectric 
boundary) of atoms i and j, and the shape of the entire biomolecule; e is the solvent dielectric 
constant, and rj is the distance between i and j. The constants n and A were set to n=2 and 
A=4 in the original formulation of Still and coworkers (Still et al., 1990). 


In the PB model, the solute dielectric constant (£in) affects the computed functions g;PB3 and 
Eq. (8). Meanwhile, in the GB model, the solute dielectric constant drops out from the final 
expression in Eq.(9), due to the approximations used to arrive at an analytic formula. An €in 
value other than 1 can still be used; in this case, the first parenthesis on the right-hand side 
of Eq. (9) becomes (1/e -1/éin) and the GB expression yields the free energy of transferring 
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the solute from an infinite reference medium with dielectric constant £in into solution. A 
careful discussion of this point is in (Bashford & Case, 2000). 


Application of Eq. (8) to a protein:ligand complex (PL) and the dissociated protein (P) and 
ligand (L) yields the electrostatic (polar) solvation free energy contribution to Eq. (3): 


AGpgp) = Grecs) (PL) — Grec) (P) — Grecs (L) (10) 


An advantage of these methods is that they facilitate the decomposition of the total 
solvation free energy into insightful components. Indeed, the summation over atomic 
charges in Eq. (8) implies that the electrostatic free energy of Eq. (10) can be partitioned into 
interaction and desolvation components (Archontis et al., 2001; Hendsch & Tidor, 1999): 


AGpgca) = >. 48 q; + >) 9:85; 1; - >) rot > 2 Jigi "j3 > qig; | (1) 
2 i jep 


ieP,jeL 2; jeP 2 i Fep 


The notation gX implies that the interaction between charges i and j, as determined by the 
function gj, is in general different in the complex, free protein and free ligand. The first term 
on the right-hand side of Eq. (11) is the “interaction term” and arises from direct interactions 
between the protein and ligand charges, which are only present in the complex; the next 
term [the first brace] is a protein “desolvation” term, arising from the replacement of high- 
dielectric solvent by the low-dielectric ligand in the protein vicinity, as well as structural 
relaxation and changes in the charge distribution of the protein. The last term [second brace] 
corresponds to the desolvation of the ligand. 


The “interaction term” can be further decomposed into contributions from specific residues 
(Archontis et al., 2001; Hendsch & Tidor 1999). For example, the contribution of a protein 
residue R to this term is given by the following expression 


AG cn = ; 2 ligi qj (12) 


ieR,jeL 


These components provide useful insights on the origin of the binding free energy values. 
They can help interpret differences in binding affinities for a series of related complexes and 
guide the design of modified proteins or ligands. For example, in a study of amino acid 
binding to native and mutant aspartyl-tRNA synthetase, residue decomposition identified 
the protein residues discriminating between the cognate ligand (aspartic acid) and the 
analogue asparagine (Archontis et al., 2001). In another study of RNAse A recognition by 
dinucleotidic inhibitors, a similar decomposition attributed the stronger binding of the most 
potent inhibitor to interactions with two active site lysines (Polydoridis et al., 2007). 


Finally, Hou and co-workers very recently evaluated the performance of MM-GBSA and 
MM-PBSA for predicting binding free energies based on molecular dynamics simulations 
(Hou et al., 2011a). Their results showed that MM-PBSA performed better in calculating 
absolute binding free energies compared to MM-GBSA but not necessarily for the relative 
binding free energies, sufficient for most applications in computational drug design. 
Interestingly, in a study of the accuracy of continuum solvation models for drug-like 
molecules, GB methods typically were more stable and gave more accurate results that the 
widely used PB methods (Kongsted et al., 2009). 
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3.3 Selection of MD trajectory conformations/snapshots 


A recent study for the binding of seven biotin analogues to avidin suggested that to obtain 
statistically converged MM-GBSA results, several independent simulations each with 
sampling times of 20-200 ps (averaging the results) is more effective than a single long 
simulation (Genheden & Ryde, 2010). ‘Single-trajectory’ simulations of the complex are 
generally sufficiently accurate for most applications, and while MD simulation length does 
have an obvious impact on the accuracy of predictions, longer MD simulations doesn’t 
necessarily mean better predictions (Hou et al., 2011a). For the calculations of the AEmm and 
AGsoiv terms, a large ensemble (e.g. several hundred) conformations are typically extracted 
in small intervals from the single MD trajectory of the complex. Alternatively, averaging 
over a select few receptor-ligand binding conformations from the MD trajectory via 
clustering has proved effective (Section 4.1), as well as more time efficient (Hayes et al., 
2011). MM-GB(PS)SA calculations on single (minimized) structures has also recently been 
proposed and validated (Kuhn et al., 2005; Rastelli et al., 2010), but not necessarily for 
structures generated from MD simulations (Section 4.2). Meanwhile, for the entropy term 
calculated using normal mode analysis, fewer snapshots (typically less than a 100) are 
employed, due to the computational cost involved. As already highlighted (Section 3.1), a 
larger number of snapshots may be required for more stable and accurate predictions. These 
calculations, however, are computationally expensive and often not feasible with limited 
computational resources. Consequently, neglect of the entropy term can in some cases lead 
to sufficient or more accurate predictions for ranking of ligand binding affinities in certain 
macromolecular systems (Hayes et al., 2011; Hou et al., 2011a; Rastelli et al., 2010). 


3.4 Limitations & caveats of MM-GB(PB)SA calculations 


MM-GB(PB)SA methods are widely recognised as valuable tools in CADD applications. 
However, as with any method they have limitations and caveats, which need to be 
considered. First, while useful for ranking relative ligand binding affinities, these methods 
lack the required accuracy for absolute binding free energy predictions (Hou et al., 2011a; 
Singh & Warshel, 2010).The inclusion of entropic contributions brings the MM-GB(PB)SA 
values somewhat closer to experimental absolute affinities (Gilson & Zhou, 2007). However, 
such entropic terms are costly and contain large uncertainties. Force-field inconsistencies 
may also be an issue: PB and GB results depend strongly on adequate atomic charges and 
van der Waals radii, which are often optimized for MD simulations. The MM-GB(PB)SA 
results may be influenced by system-dependent properties, such as the features of the 
binding site, the extent of protein and ligand conformational relaxation upon association, 
and the protein and ligand charge distribution (Kuhn et al., 2005; Hou et al., 2011a). 
Continuum electrostatics models ignore the molecular structure of the solvent; in some cases 
this might affect the results, particularly when key receptor-ligand interactions are bridged 
by water molecules, c.f. Section 4.1 (Hayes et al., 2011). Furthermore, the value of the 
protein/ligand dielectric constant is empirically chosen, and takes into account not only the 
protein and ligand structural relaxation, but also other error-introducing factors such as the 
ones mentioned above (Archontis & Simonson, 2005; Schutz & Warshel, 2001). Hou and co- 
workers suggested in a recent MM-PBSA study that the use of £in = 4 for a highly charged 
protein-ligand binding interface, £in = 2 for a moderately charged binding interface and £in = 
1 for a hydrophobic binding interface may improve ligand ranking (Hou et al., 2011a). The 
lack of a consistent optimum dielectric constant for MM-PBSA calculations has been noted 
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by other workers (see for e.g. (Aleksandrov et al. 2010)), although generally a value ein =4 
often gives satisfactory results (Aleksandrov et al., 2010; Archontis et al., 2001; Thompson et 
al., 2006). As a final note, MM-GB(PB)SA calculations require some degree of user expertise 
and planning, from the initial set-up and analysis of the MD simulations through to the 
binding free energy calculations themselves. 


3.5 Variations & extensions of MM-GB(PB)SA 


While molecular docking algorithms are computationally efficient methods used to screen a 
large number of ligands against a given target in reasonable time, generating and post- 
processing MD ensembles for more than a few receptor-ligand structures in MM-GB(PB)SA 
calculations is currently impractical. Recently, however, MM-GB(PB)SA methods are 
receiving plaudits as post-docking methods in virtual screening experiments. Post- 
processing of single docking poses using MM-GB(PS)SA algorithms can improve 
correlations between predicted and experimental binding affinities with a number of 
successes reported, c.f. for example (Du et al., 2011; Hou et al., 2011b; Lyne et al.; 2006). This 
is consistent with the previously mentioned discovery that select single receptor-ligand 
structures can prove as accurate as sampling over large numbers of MD trajectory snapshots 
in MM-GB(PB)SA applications (Kuhn et al., 2005; Rastelli et al., 2010). Post-docking MM- 
GBSA is implemented in Schrödinger software using the program Prime, with options to 
include receptor and ligand flexibility; the entropy term is neglected by default. Other recent 
extensions of MM-PBSA exploit quantum mechanics (QM) methods in QM/MM-PBSA 
calculations (Grater et al., 2005; Manta et al., 2012; Wang & Wong, 2007). Here, a “hybrid” 
gas phase energy term (Egm/mm) effectively replaces the pure molecular mechanics energy 
(Emm) term in Eq. (5). Representing the ligand by QM has the advantage, for example, to 
eliminate a frequently encountered problem of deficient ligand forcefield parameters. 
However, calculations of this sort are significantly more expensive than MM-GB(PB)SA, and 
therefore are typically only viable for binding predictions on relatively few ligands. 


4. Recent applications of MM-GB(PS)SA 


In the present section, we will review recent examples of the use of MM-GB(PS)SA 
calculations for calculating ligand binding free energies highlighting both successes and 
limitations. 


4.1 Example 1: Phosphorylase kinase ATP-binding site inhibitors 


With an aim towards glycogenolysis control in type 2 diabetes, indirubin (ICso> 50 uM), 
indirubin-3’-oxime (ICs = 144 nM), KT5720 (Kj = 18.4 nM) and staurosporine (K; = 0.37 nM) 
were investigated as phosphorylase kinase (PhKytrnc) ATP-binding site inhibitors (Hayes et 
al., 2011). 


Due to the lack of experimental structural information for binding of these ligands (Figure 2), 
MD simulations in explicit solvent using Desmond 2.0 (Bowers et al., 2006) were performed 
for each receptor-inhibitor complex, with 4 ns production runs following an initial 
equilibration period. A computationally-efficient multiple timestep RESPA integration 
algorithm was employed with timesteps of 2, 2 and 6 fs for bonded, “near” and “far” 
non-bonded interactions, respectively. Energy and trajectory atomic coordinate data 
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KT5720 Stauresporine 


Fig. 2. The ATP-binding site inhibitors of phosphorylase kinase, as studied in Ref. (Hayes et 
al., 2011). 


were recorded every 1.2 and 2.1 ps, respectively. For the trajectory analysis preceding the 
‘single-trajectory’ based MM-GBSA calculations, both VMD (Humphrey et al., 1996) and 
Desmonds’ Maestro simulation analysis tools were employed. The extracted MD trajectory 
binding site conformations (inhibitors + residues within 7 A) of each complex were 
clustered into 10 groups based on atomic root-mean-square-distances (RMSDs). Every 
second frame (snapshot) of the last 3 ns of the production run (analysis phase) was used in 
the hierarchial clustering algorithm employed by the Desmond Maestro’s Trajectory 
Clustering module. The representative complex of each of the 10 binding site cluster families 
(waters and counterions deleted) was then used in MM-GBSA calculations of binding free 
energies using Eq. (3). Schrédinger software with the MacroModel 9.7 Embrace module was 
used for the AEmm and AGsow calculations, while the entropy change, AS, was calculated for 
the minimized representatives using Rigid Rotor Harmonic Oscillator (RRHO) calculations. 
Using this algorithm, the change in vibrational, rotational and translational (VRT) entropy 
of the ligands on binding was estimated. Finally, the thermodynamic average AGping values 
(300 K) were estimated using the corresponding values for the 10 cluster representatives: 


10 
AGpind = Èp AG pina (i) (13) 
i=1 


The sum i was over the 10 cluster representatives, with p; defined as the cluster frequency: 


Pi = 
N total 


where N; the number of frames in cluster i, and Not the total number of frames. 
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Fig. 3. Predicted binding of staurosporine (gray) at the ATP-binding site of PhKytrnc. Key 
interacting residues are colored by type - polar/charged residues shown in red (D104, E110 
and E153) and hydrophobic in green (M106). Hydrogen bonds are formed with M106 and 
D104 backbones (hinge region), E110 and E153. 


The relative binding affinities from the final AGping values were generally in agreement with 
experiment, except the rankings of KT5720 and staurosporine (representative complex 
shown in Figure 3) were reversed by 0.7 kcal/mol. The discrepancy was accounted for by 
neglect of certain key contributions to the binding free energies using the MM-GBSA 
algorithm. Notably, accounting for and estimating the loss of the conformational entropy for 
the more flexible KT5720 ligand yielded AGping values 1.2 - 2.6 kcal/mol in favour of 
staurosporine binding and hence in agreement with the experimental rankings. Further, 
whereas staurosporine had no key receptor-ligand bridging waters, the MD simulations 
revealed a key role of bridging waters for KT5720 binding. The entropy loss associated with 
a bound water molecule in a protein-ligand complex is sometimes important (but typically 
neglected in MM-GB(PB)SA calculations), with an upper bound free energy cost of 2 
kcal/mol at 300 K suggested (Dunitz, 1994). 


4.2 Example 2: Interpretation of species specificity of compstatin, a peptidic inhibitor 
of the complement system 


The complement system provides the first line of defense against the invasion of foreign 
pathogens (Mastellos et al., 2003). Its inappropriate activation may cause or aggravate 
several pathological conditions, including asthma, macular degeneration, rheumatoid 
arthritis, and rejection of xenotransplantation. The 13-residue compstatin is a promising 
candidate for the therapeutic treatment of unregulated complement activation (Janssen et 
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al., 2007; Morikis & Lambris, 2005). The conformation of the human C3 - compstatin 
complex is shown in Figure 4. Compstatin interacts closely with four protein sectors 
indicated by thick yellow tubes. An important conclusion, drawn from experimental studies 
with a large number of species, was that compstatin inhibits the key complement 
component protein C3 from several primate species, but is inactive against the 
corresponding protein from lower mammals, precluding the testing of compstatin-based 
analogues in animal models (Sahu et al., 2003). To understand this species specificity, 
Tamamis and co-workers compared the stabilities of compstatin complexes with human or 
rat C3 (Tamamis et al., 2010) by atomistic MD simulations and an MM-GBSA analysis. In the 
simulations of the rat C3 complex, specific protein sectors near the compstatin binding site 
underwent reproducible localized displacements, which eliminated or weakened critical 
protein-ligand interactions. In agreement with the simulations and the lack of compstatin 
activity against rat C3,a MM-GBSA analysis estimated the binding free energy of the human 
C3 complex to be stronger by -9 kcal/mol relative to the rat C3 complex, in the “single- 
trajectory” approximation. If protein and ligand relaxation were taken into account by a 
“three-trajectory” approximation, the relative binding free energy increased (in absolute 
value) to -19 kcal/mol. Thus, in this system the neglect of relaxation effects introduced a 
significant error, even though the qualitative conclusion was correct in both cases. 


Fig. 4. Representation of the human C3c-compstatin complex. The compstatin main chain is 
shown ina red tube and licorice representation. The protein C3c is shown as a gray tube; 
only residues 329-534 and 607-620 are included, corresponding to the simulation system of 
Ref. (Tamamis et al, 2010). The yellow thick tubes show four protein sectors in proximity to 
compstatin (488-492, 454-462, 344-349 and 388-393, from left to right). The right-most sector 
388-393 moves away from compstatin in the simulations of non-primate C3 complexes 
(Tamamis et al, 2010; Tamamis et al, 2011). 
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Guided by the above MM-GBSA study, subsequent simulations investigated the stabilities 
of compstatin complexes with “transgenic” variants of mouse C3, containing 6-9 
substitutions from the human C3 sequence near the compstatin binding site. The MM-GBSA 
binding affinities of the resulting transgenic complexes were comparable to the one of the 
human C3 complex, and by -8 to -9 kcal/mol stronger relative to the mouse C3:compstatin 
complex (Tamamis et al., 2011). More recent simulations have investigated the affinities of a 
large number of human and mouse or rat C3 complexes with compstatin analogues, 
producing promising results (Tamamis et al., 2012). Thus, the combined study of a series of 
related protein-ligand complexes by atomistic simulations and an efficient evaluation of the 
corresponding affinities, such as in the MM-GB(PB)SA formulation, provides a powerful 
way to design new proteins or ligands. 


4.3 Example 3: Fast predictions of binding free energies using MM-GB(PB)SA 


Rastelli and co-workers (Rastelli et al., 2010) explored the reliability of using a single energy 
minimized receptor-ligand complex in MM-GB(PB)SA calculations to estimate ligand 
binding affinities for a series of structurally diverse inhibitors (Figure 5) of Plasmodium 
falciparum DHFR (Figure 6) with known binding modes and affinities. 
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Fig. 5. Sample of the structurally diverse P(DHFR inhibitors studied in Ref. (Rastelli et al., 2010). 
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Fig. 6. Wild-type Pf(DHFR in complex with NADPH (yellow) and the inhibitor WR92210 
(green). This protein structure taken from PDB code 1J3I was used for the calculations in 
Ref. (Rastelli et al., 2010) and as described in Example 3. 


They obtained excellent correlations between MM-PBSA or MM-GBSA binding affinities 
and experimental values, similar to those obtained after averaging over multiple snapshots 
from periodic boundary MD simulations in explicit water in the traditional sense, but with 
significant savings on computational time and effort. Different methods were used for 
generating the structures for the MM-GB(PB)SA calculations from minimizations in implicit 
and explicit solvent models, to minimization using a distance dependent dielectric function, 
and finally minimization followed by a short MD simulation and then re-minimization. The 
approach has been implemented in an automated workflow called BEAR (Binding 
Estimation After Refinement) which produces both MM-GBSA and MM-PBSA predictions 
of binding free energies, and is fast enough to be suitable for virtual screening applications 
(Degliesposti et al., 2011; Rastelli et al., 2009). 


5. Conclusion 


MM-GBSA and MM-PBSA are computationally efficient, end-point free energy methods 
that have been widely used to study protein-ligand binding affinities. Even though they lack 
the sound theoretical foundations of recently developed computationally demanding 
absolute-affinity free-energy methods (Boresch et al., 2003; Deng & Roux, 2009; Gilson & 
Zhou, 2007; Lee & Olsen, 2006), their connection with statistical thermodynamics has been 
established (Swanson et al., 2004). Due to the approximations inherent in MM-GB(PB)SA 
methods, they are more applicable for ranking (“scoring”) of ligand binding affinities rather 
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than to quantitatively predicted absolute binding free energies. They should be regarded as 
approximate, as they combine a molecular mechanics energy function with a continuum- 
electrostatics treatment of solvation effects; they include solute conformational entropy 
effects in an approximate manner (Singh & Warshel, 2010); and ignore the solvent molecular 
structure. Accurate incorporation of solute entropy (Foloppe & Hubbard, 2006) and solvent 
effects in binding affinity calculations is challenging, but future extensions and development 
of MM-GB(PB)SA methods will undoubtedly serve to address these limitations. 
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